
Table of Contents

V 3.1.0 vs 3.0.4
V 3.0.4 vs 3.0.3
V 3.0.3 vs 3.0.2
V 3.0.2 vs 3.0.1
V 3.0.1 vs 3.0
V 3.0 vs 2.0
V 2.0
V 1.2
V 1.1

V 3.1.0 vs 3.0.4

Forward and backward substitution entirely rewritten for better parallelism and GPU use (helpful when there are multiple right-hand sides).
New CUDA kernels for panel reduction, which drastically improve performance.
Added a routine to compute A^T A and a benchmark for the normal equations.
Multiple bugfixes and minor improvements.
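
For context, the normal equations referred to above are the standard reformulation of a least-squares problem. The sketch below states them in textbook form, assuming A has full column rank so that A^T A is symmetric positive definite. The symbols b, x, y and R are introduced here for illustration only; the release notes do not describe how the benchmark itself is set up.

```latex
\begin{aligned}
\min_x \|Ax - b\|_2 \;&\Longrightarrow\; A^T A\,x = A^T b,\\
A^T A = R^T R \;&\Longrightarrow\; R^T y = A^T b,\quad R\,x = y.
\end{aligned}
```

Here R is the upper-triangular Cholesky factor of A^T A, so the two triangular solves in the last line are a forward and a backward substitution, respectively.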

V 3.0.4 vs 3.0.3

Improved CMake install scripts and procedure.

V 3.0.3 vs 3.0.2

Minor bugfixes: inconsistent argument passing in the block_axpy and block_copy tasks, in spmat_mv, and in FindAMD.

V 3.0.2 vs 3.0.1

Raise an error in potrf if the matrix is indefinite.
Better handling of transposition.
Various bug fixes.
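
To illustrate why a Cholesky routine can detect an indefinite matrix, here is a minimal pure-Python sketch. It is not qr_mumps code: the function name `cholesky_lower` and the use of `ValueError` are assumptions made for this example. The idea is that a pivot which is not strictly positive means the matrix is not positive definite, so the factorization stops instead of taking the square root of a negative number.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with a = L L^T, or raise ValueError.

    `a` is a symmetric matrix given as a list of rows (hypothetical
    helper for illustration; not part of any library).
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal entry minus the contribution of earlier columns.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            # A non-positive pivot means the matrix is not positive definite.
            raise ValueError(f"matrix is not positive definite (pivot {j})")
        l[j][j] = math.sqrt(d)
        # Fill the rest of column j below the diagonal.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l
```

Calling `cholesky_lower([[1.0, 2.0], [2.0, 1.0]])` raises `ValueError`, since that matrix has one positive and one negative eigenvalue. Note that the check `d <= 0.0` also rejects singular (semidefinite) matrices, which is a choice made for this sketch.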

V 3.0.1 vs 3.0

Fix in the CMake files, which caused the install step to misbehave on Windows systems.

V 3.0 vs 2.0

Support for Nvidia GPUs through the StarPU runtime system.
Cholesky factorization for solving symmetric positive definite systems.
Dynamic, hierarchical partitioning for the QR factorization.
Switched to CMake for the build process.
Switched to the fstarpu_mod StarPU module instead of hand-made interfaces and wrappers.
Environment variables to set default values for all control parameters.

V 2.0

Version 2.0 is an almost complete rewrite of the qr_mumps package. Here are some of the main changes with respect to previous versions:

Parallelism is now achieved using the StarPU runtime engine.
2D block partitioning can be used for frontal matrices in combination with communication-avoiding dense factorization algorithms.
It is possible to bound the memory consumption of the parallel factorization phase.
Pipelining of operations can be achieved through the asynchronous API.
The error handling has been substantially reworked to make it thread-safe.

V 1.2

Added a method to extract the R factor once the factorization is computed.

V 1.1

There is no limit to the number of concurrent instances of qrm_spmat_c in the C interface.
A number of minor bugfixes.